Introduction to Data-Oriented Parsing

Authors

  • Rens Bod
  • Remko Scha
  • Günter Neumann

Abstract

We present HPSG–DOP, a method for automatically extracting a Stochastic Lexicalized Tree Grammar (SLTG) from an HPSG source grammar and a given corpus. The SLTG is processed by a specialized fast parser. The approach has been tested on a large English grammar and has been shown to yield a further performance improvement over parsing with a highly tuned HPSG parser. Our approach is simple and transparent. The extracted grammars are declaratively represented and have a high degree of practical applicability.

Head-Driven Phrase Structure Grammar (HPSG) has proven to be a quite successful formalism for specifying natural language grammars in a highly modular and compact manner (Pollard and Sag 1994). It supports the systematic definition of complex linguistic information, and of interactions between information on different strata, using typed feature constraints. However, inefficient processing of such grammars is the major obstacle to using the HPSG formalism in practical NL applications (Makino et al. 1998).
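
The abstract does not say how the SLTG's probabilities are estimated. As a minimal sketch, assuming the classic DOP1 scheme (every fragment of every corpus tree is counted, a fragment's probability is its count divided by the total count of fragments with the same root label, and a derivation's probability is the product of its fragments' probabilities), the core idea can be illustrated on toy phrase-structure trees. The tuple encoding and all function names (`fragments`, `fragment_probabilities`, `substitute`, `derivation_probability`) are hypothetical; this is not the HPSG–DOP extraction procedure itself, which starts from an HPSG source grammar and a corpus.

```python
from collections import Counter
from itertools import product
from math import prod

# Hypothetical encoding (assumption): a tree is a tuple (label, child, ...),
# where each child is a terminal word (str) or another tree. A label-only
# tuple such as ("NP",) marks an open substitution site on a fragment's frontier.


def fragments(node):
    """Return every DOP1-style fragment rooted at `node`."""
    label, *children = node
    options = []
    for child in children:
        if isinstance(child, str):
            options.append([child])  # terminal words are always kept
        else:
            # Either cut here (open site) or keep expanding below this child.
            options.append([(child[0],)] + fragments(child))
    return [(label, *combo) for combo in product(*options)]


def all_nodes(tree):
    """Yield every nonterminal node of a complete corpus tree."""
    yield tree
    for child in tree[1:]:
        if not isinstance(child, str):
            yield from all_nodes(child)


def fragment_probabilities(corpus):
    """Relative frequency: count(t) / total count of fragments sharing t's root."""
    counts = Counter(
        frag for tree in corpus for node in all_nodes(tree) for frag in fragments(node)
    )
    root_totals = Counter()
    for frag, c in counts.items():
        root_totals[frag[0]] += c
    return {frag: c / root_totals[frag[0]] for frag, c in counts.items()}


def leftmost_site(tree):
    """Label of the leftmost open substitution site, or None if there is none."""
    for child in tree[1:]:
        if isinstance(child, str):
            continue
        if len(child) == 1:
            return child[0]
        found = leftmost_site(child)
        if found is not None:
            return found
    return None


def substitute(frag, sub):
    """Leftmost substitution: plug `sub` into the leftmost open site of `frag`."""
    site = leftmost_site(frag)
    if site != sub[0]:
        raise ValueError(f"cannot substitute {sub[0]!r} at site {site!r}")

    def plug(tree):
        # Rebuild the tree left to right, replacing only the first open site.
        new_children, done = [], False
        for child in tree[1:]:
            if done or isinstance(child, str):
                new_children.append(child)
            elif len(child) == 1:
                new_children.append(sub)
                done = True
            else:
                new_child, done = plug(child)
                new_children.append(new_child)
        return (tree[0], *new_children), done

    return plug(frag)[0]


def derivation_probability(derivation, probs):
    """DOP1: a derivation's probability is the product of its fragments' probabilities."""
    return prod(probs[frag] for frag in derivation)


if __name__ == "__main__":
    corpus = [
        ("S", ("NP", "she"), ("VP", "sleeps")),
        ("S", ("NP", "he"), ("VP", "sleeps")),
    ]
    probs = fragment_probabilities(corpus)
    top = ("S", ("NP",), ("VP",))
    tree = substitute(substitute(top, ("NP", "she")), ("VP", "sleeps"))
    print(tree == corpus[0])  # True
    print(derivation_probability([top, ("NP", "she"), ("VP", "sleeps")], probs))  # 0.125
```

In this toy corpus, four different derivations yield the first tree, each with probability 0.125; summing them gives that tree's probability under this scheme (0.5). Because the number of fragments grows exponentially with tree size, exhaustive enumeration like this is only practical for very small examples.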

Similar resources

A memory-based model of syntactic analysis: data-oriented parsing

This paper presents a memory-based model of human syntactic processing: Data-Oriented Parsing. After a brief introduction (section 1), it argues that any account of disambiguation and many other performance phenomena inevitably has an important memory-based component (section 2). It discusses the limitations of probabilistically enhanced competence-grammars, and argues for a more principled mem...

An improved joint model: POS tagging and dependency parsing

Dependency parsing is a form of syntactic parsing of natural language that automatically analyzes the dependency structure of sentences, producing a dependency graph for each input sentence. Part-of-speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers perform POS tagging and dependency parsing in a pipeline. Unfortunately, in pipel...

A Data-Oriented Approach to Semantic Interpretation

In Data-Oriented Parsing (DOP), an annotated language corpus is used as a stochastic grammar. The most probable analysis of a new input sentence is constructed by combining sub-analyses from the corpus in the most probable way. This approach has been successfully used for syntactic analysis, using corpora with syntactic annotations such as the Penn Treebank. If a corpus with semantically annotat...

The Impact of Morphology on Dependency Parsing of Persian

Data-driven systems can easily be adapted to different languages and domains. Applying this trend to dependency parsing led to the introduction of data-driven approaches. The existence of appropriate corpora containing sentences and their associated dependency trees is the only prerequisite for data-driven approaches. Despite highly accurate results for the dependency parsing task in English langu...

Aspects Of Pattern-Matching In Data-Oriented Parsing

Data-Oriented Parsing (DOP) ranks among the best parsing schemes, pairing state-of-the-art parsing accuracy with the psycholinguistic insight that larger chunks of syntactic structures are relevant grammatical and probabilistic units. Parsing with the DOP model, however, seems to involve a lot of CPU cycles and a considerable amount of double work, brought on by the concept of multiple derivation...

Darwinised Data-Oriented Parsing - Statistical NLP with Added Sex and Death

We present the Darwinised Data-Oriented Parsing algorithm, an incremental, dynamic form of Data-Oriented Parsing, in which exemplars are used as replicators, subject to a selection pressure towards generalisability.

Publication date: 2003